Genomic studies of the additive and dominant genetic control on production traits of Euterpe edulis fruits

In forest genetic improvement programs for non-domesticated species, limited knowledge of kinship can compromise or make the estimation of variance components and genetic parameters of traits of interest unfeasible. We used mixed models and genomics (in the latter, considering additive and non-additive effects) to evaluate the genetic architecture of 12 traits in juçaizeiro for fruit production. A population of 275 genotypes without genetic relationship knowledge was phenotyped over three years and genotyped by whole genome SNP markers. We have verified superiority in the quality of the fits, the prediction accuracy for unbalanced data, and the possibility of unfolding the genetic effects into their additive and non-additive terms in the genomic models. Estimates of the variance components and genetic parameters obtained by the additive models may be overestimated since, when considering the dominance effect in the model, there are substantial reductions in them. The number of bunches, fresh fruit mass of bunch, rachis length, fresh mass of 25 fruits, and amount of pulp were strongly influenced by the dominance effect, showing that genomic models with such effect should be considered for these traits, which may result in selective improvements by being able to return more accurate genomic breeding values. The present study reveals the additive and non-additive genetic control of the evaluated traits and highlights the importance of genomic information-based approaches for populations without knowledge of kinship and experimental design. Our findings underscore the critical role of genomic data in elucidating the genetic control architecture of quantitative traits, thereby providing crucial insights for driving species' genetic improvement.

www.nature.com/scientificreports/ A fruit sample of each genotype was obtained, packed in properly identified plastic bags, and sent to the Plant Biometry Laboratory at the Federal University of Espírito Santo, where morphometric evaluations of fruit and seeds were carried out following a completely randomized design.
The weight of 25 fruits (WF) and the weight of 25 seeds (WS) were evaluated in grams (g) in four replications per genotype. The amount of pulp in 25 fruits (AP), in grams, was obtained as the difference between WF and WS. The following formula was used to obtain the pulp yield (PY), in percent:
$$PY = \frac{AP}{WF} \times 100$$
where AP is the amount of pulp and WF is the fresh mass of 25 fruits.
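As a small worked illustration of these two derived traits, the sketch below computes AP and PY for one replicate. It assumes the usual definitions AP = WF - WS and PY = 100 x AP / WF; the function name `pulp_traits` and the input values are hypothetical and are not taken from the study data.

```python
def pulp_traits(wf_g, ws_g):
    """Return (AP in g, PY in %) from the weight of 25 fruits and of 25 seeds."""
    ap = wf_g - ws_g          # amount of pulp: AP = WF - WS (assumed definition)
    py = 100.0 * ap / wf_g    # pulp yield: PY = AP / WF x 100 (assumed definition)
    return ap, py


# Hypothetical replicate: 25 fruits weigh 30 g and their 25 seeds weigh 18 g.
ap, py = pulp_traits(30.0, 18.0)
print(f"AP = {ap:.1f} g, PY = {py:.1f}%")
```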
To evaluate the equatorial and longitudinal diameters of fruits and seeds (EDF, LDF, EDS, and LDS, respectively), five fruits were measured in millimeters (mm) using a digital caliper. Marçal et al. (2016) 31 concluded that five measurements are necessary to achieve a 95% coefficient of determination; measurements beyond this number provide additional information but increase cost and evaluation time.
Genotyping and quality control of molecular markers. Genomic DNA was obtained from leaf samples of the 275 genotypes following the CTAB method 32 with modifications 33 . After its quantity and quality were verified using a Nanodrop™ 2000 spectrophotometer (Thermo Scientific), the genomic DNA was sent to Mexico for high-throughput genotyping using the DArTseq™ methodology at the Genetic Analysis Service for Agriculture (SAGA).
The sequences were analyzed using Dartsoft14, an automated analysis program of genomics data, and DArTdb, a laboratory management system, developed and patented by DArT Pvt. Ltd. (Australia), generating SNP marker data as described by Killian et al. (2012) 34 and Sansaloni et al. (2020) 22 . Additional information can be found at 35 .
The SNP markers were of the codominant type, and the molecular dataset underwent the quality control process carried out in R 36 , using a Call Rate of 90% and MAF (Minor Allele Frequency) of 5%, reducing the marker dataset by 81.75%, starting with 44,457 markers and keeping 8112 for use in the analysis.
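The two quality-control filters can be sketched as follows. This is a minimal Python illustration, not the R pipeline used in the study: the 0/1/2 genotype coding with `None` for missing calls, the function name `filter_snps`, and the inclusive thresholds are all assumptions, and the toy data are invented.

```python
def filter_snps(genotypes, min_call_rate=0.90, min_maf=0.05):
    """Return the column indices of markers that pass both filters.

    genotypes: one row per individual; each entry is 0, 1 or 2 (copies of
    one allele) or None for a missing call (hypothetical layout).
    """
    n_ind = len(genotypes)
    kept = []
    for j in range(len(genotypes[0])):
        calls = [row[j] for row in genotypes if row[j] is not None]
        if len(calls) / n_ind < min_call_rate:
            continue                        # too many missing calls
        p = sum(calls) / (2 * len(calls))   # frequency of the counted allele
        if min(p, 1 - p) >= min_maf:        # minor allele frequency filter
            kept.append(j)
    return kept


# Toy data: 10 individuals, 3 markers (polymorphic, monomorphic, low call rate).
m0 = [0, 1, 2, 1, 0, 1, 2, 1, 0, 1]
m1 = [0] * 10
m2 = [1, None, 1, None, 1, 1, 1, 1, 1, 1]
geno = [list(row) for row in zip(m0, m1, m2)]
kept = filter_snps(geno)

# Reduction reported in the text: from 44,457 to 8112 markers.
reduction_pct = 100 * (1 - 8112 / 44457)
print(kept, round(reduction_pct, 2))
```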
Statistical models. Three models were employed to predict breeding values in the context of genetic improvement: BLUP and two genomic models (GBLUP). The first model considered only the additive effects (GBLUP-A), while the second considered the additive-dominant effects (GBLUP-AD).
Traditional BLUP. The prediction of breeding values using the BLUP model was performed in R 36 using the SOMMER package version 3.4. The mixed linear model used was:
$$y = X\mu + Za + \varepsilon$$
where $y$ is the phenotype vector; $X$ is the fixed-effect incidence matrix (year); $\mu$ is the fixed-effects vector (mean and year); $Z$ is the incidence matrix of the random effects (genotype); and $a$ is the vector of additive genetic effects, considered random with $a \sim N(0, A\sigma^2_a)$, where $A$ is the pedigree-based kinship matrix and $\sigma^2_a$ is the additive genetic variance. In this case, the relationship is completely unknown; hence $A$ is an identity matrix $I$. The residual vector $\varepsilon$ is assumed to be random with $\varepsilon \sim N(0, I\sigma^2_e)$, where $\sigma^2_e$ is the residual variance.
Genomic BLUP (GBLUP). The GBLUP model is similar to BLUP, except for the assumed kinship matrix: GBLUP replaces the $A$ matrix with the genomic kinship matrix $G$ derived from the marker information. The genomic breeding values (GEBVs) are predicted using the model below:
$$y = X\mu + Zg + \varepsilon$$
where $g$ is the vector of additive genetic effects, considered random with $g \sim N(0, G\sigma^2_g)$, where $G$ is the additive genomic kinship matrix and $\sigma^2_g$ is the additive genetic variance. The genomic kinship matrix ($G$) used was proposed by VanRaden 18 as follows:
$$G = \frac{WW'}{2\sum_{j} p_j (1 - p_j)}$$
where $W$ is the centered marker matrix as presented by Vitezica et al. (2013), with
$$W_{ij} = \begin{cases} 2 - 2p_j & \text{if the genotype is } A_1A_1 \\ 1 - 2p_j & \text{if the genotype is } A_1A_2 \\ -2p_j & \text{if the genotype is } A_2A_2 \end{cases}$$
where $W_{ij}$ is the element of the ith row and jth column of the marker matrix, and $p_j$ is the allele frequency of the jth marker.
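To make the construction of the additive genomic kinship matrix concrete, the sketch below builds G from a small genotype matrix in plain Python. It assumes markers coded as 0/1/2 copies of a reference allele with no missing calls, centering by 2p, and scaling by 2Σp(1 - p); the function name `additive_grm` and the toy genotypes are hypothetical, and the study itself used R.

```python
def additive_grm(M):
    """Additive genomic relationship matrix G = WW' / (2 * sum p(1 - p)).

    M: n x m list of lists with genotype counts 0, 1 or 2 (assumed coding).
    """
    n, m = len(M), len(M[0])
    # Allele frequency of the counted allele at each marker.
    p = [sum(row[j] for row in M) / (2 * n) for j in range(m)]
    # Centered marker matrix: W_ij = M_ij - 2 p_j.
    W = [[row[j] - 2 * p[j] for j in range(m)] for row in M]
    denom = 2 * sum(pj * (1 - pj) for pj in p)
    return [
        [sum(W[i][k] * W[l][k] for k in range(m)) / denom for l in range(n)]
        for i in range(n)
    ]


# Toy example: 3 individuals, 2 markers.
G = additive_grm([[2, 0], [1, 1], [0, 2]])
for row in G:
    print(row)
```

With real data the same computation is normally done with matrix libraries; the loop form is used here only to keep each term of the formula visible.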

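Additive-dominance genomic BLUP (GBLUP-AD)
$$y = X\mu + Z_1 g + Z_2 d + \varepsilon$$
where $Z_1$ and $Z_2$ are the incidence matrices of the additive and dominance genetic effects, respectively, considered random; $d$ is the random-effect vector of dominance, where $d \sim N(0, D\sigma^2_d)$, $D$ is the genomic relationship matrix for dominance effects generated from marker information, and $\sigma^2_d$ is the dominance genetic variance.
Dominance genomic kinship matrix (D). The $D$ matrix was obtained as follows [37][38][39] :
$$D = \frac{SS'}{\sum_{j} \left[ 2 p_j (1 - p_j) \right]^2}, \qquad S_{ij} = \begin{cases} -2(1 - p_j)^2 & \text{if the genotype is } A_1A_1 \\ 2p_j(1 - p_j) & \text{if the genotype is } A_1A_2 \\ -2p_j^2 & \text{if the genotype is } A_2A_2 \end{cases}$$
with $S_{ij}$ as the elements of the ith row and jth column of the $S$ matrix, and $p_j$ as the allele frequency of the jth marker.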
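A dominance relationship matrix can be sketched in the same way as the additive one. The example below assumes the parameterization of Vitezica et al. (2013), in which the dominance covariate is -2q², 2pq or -2p² for the three genotypes and the scaling factor is Σ(2pq)²; whether this matches the exact coding used in the study is an assumption. The function name `dominance_grm` and the toy genotypes are hypothetical.

```python
def dominance_grm(M):
    """Dominance genomic relationship matrix D = SS' / sum (2 p q)^2.

    M: n x m list of lists with genotype counts 0, 1 or 2 of the allele
    whose frequency is p (assumed coding, no missing calls).
    """
    n, m = len(M), len(M[0])
    p = [sum(row[j] for row in M) / (2 * n) for j in range(m)]
    q = [1 - pj for pj in p]

    def s_value(genotype, j):
        # Dominance covariate for each genotype class (assumed coding).
        if genotype == 2:
            return -2 * q[j] ** 2
        if genotype == 1:
            return 2 * p[j] * q[j]
        return -2 * p[j] ** 2

    S = [[s_value(row[j], j) for j in range(m)] for row in M]
    denom = sum((2 * p[j] * q[j]) ** 2 for j in range(m))
    return [
        [sum(S[i][k] * S[l][k] for k in range(m)) / denom for l in range(n)]
        for i in range(n)
    ]


# Toy example: 3 individuals, 2 markers.
D = dominance_grm([[2, 0], [1, 1], [0, 2]])
for row in D:
    print(row)
```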

Accuracy of genetic values and adjustment of models.
The methodologies were compared based on the theoretical accuracy of breeding values (r), the prediction error variance (PEV), the logarithm of the residual likelihood (LL), the Akaike information criterion (AIC), and the Bayesian information criterion (BIC).
The r values were estimated using the following expression:
$$r = \sqrt{1 - \frac{PEV}{\sigma^2_g}}$$
where PEV is the prediction error variance, obtained from the diagonal elements of the inverse of the coefficient matrix of the mixed model equations 40 , and $\sigma^2_g$ is the genetic variance. AIC 41 was calculated as follows:
$$AIC = -2\log(L) + 2k$$
where $\log(L)$ is the logarithm of the residual likelihood function (LL), and $k$ is the number of estimated parameters.
BIC 42 can be written as:
$$BIC = -2\log(L) + k\log(v)$$
where $v$ is the number of residual degrees of freedom.
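The comparison criteria reduce to a few lines of arithmetic. The sketch below assumes the standard forms r = sqrt(1 - PEV/σ²g), AIC = -2 log(L) + 2k and BIC = -2 log(L) + k log(v); the function names and the numeric inputs are hypothetical and do not come from Table 4.

```python
import math


def accuracy(pev, sigma2_g):
    """Theoretical accuracy of a predicted breeding value."""
    return math.sqrt(1.0 - pev / sigma2_g)


def aic(loglik, k):
    """Akaike information criterion from the (residual) log-likelihood."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, v):
    """Bayesian information criterion; v = residual degrees of freedom."""
    return -2.0 * loglik + k * math.log(v)


# Hypothetical values for one trait under two models.
ll_a, ll_ad = -512.4, -509.1
print("GBLUP-A :", aic(ll_a, 2), bic(ll_a, 2, 800))
print("GBLUP-AD:", aic(ll_ad, 3), bic(ll_ad, 3, 800))
print("accuracy:", accuracy(pev=0.19, sigma2_g=1.0))
```

Lower AIC and BIC values indicate a better trade-off between fit and number of parameters, which is how the models are ranked in the text.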
Cross-validation and predictive ability. Predictive ability was estimated by the correlation between the predicted breeding values of the BLUP, GBLUP-A, and GBLUP-AD models ($\hat{a}$, $\hat{g}$, and $\hat{u} = \hat{g} + \hat{d}$, respectively) and the phenotypic mean of each genotype over the evaluated years ($y$). To this end, we used a cross-validation procedure, subdividing the population of 275 genotypes into five folds; in each round, 220 genotypes constituted the training population and 55 the validation population. The average predictive ability ($r_{y,\hat{u}}$) and its standard error were calculated from the results obtained.
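The cross-validation scheme can be sketched as below. The code only illustrates the fold partition (275 genotypes into five folds of 55) and the summary of the correlations; the prediction step is left as a user-supplied callback, `predict(train_idx, test_idx)`, which is a hypothetical interface standing in for fitting BLUP, GBLUP-A or GBLUP-AD on the training genotypes. The random assignment of genotypes to folds and the seed are assumptions.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Randomly partition indices 0..n-1 into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def predictive_ability(y, predict, k=5, seed=0):
    """Mean and standard error of the per-fold correlation r(y, u_hat)."""
    folds = k_fold_indices(len(y), k, seed)
    rs = []
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        u_hat = predict(train, test)   # hypothetical model-fitting callback
        rs.append(pearson(u_hat, [y[i] for i in test]))
    mean_r = sum(rs) / k
    sd = (sum((r - mean_r) ** 2 for r in rs) / (k - 1)) ** 0.5
    return mean_r, sd / k ** 0.5


folds = k_fold_indices(275)
print([len(fold) for fold in folds])
```

A predictor that returns the same value for every validation genotype, as happens with BLUP when the kinship matrix is the identity, has no variance, so its correlation with the phenotypes is undefined; the `pearson` helper above does not handle that case.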
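Components of variance and genetic parameters. The variance components were obtained via the REML method. The genetic parameters were estimated by:
$$h^2 = \frac{\sigma^2_a}{\sigma^2_f}, \qquad H^2 = \frac{\sigma^2_a + \sigma^2_d}{\sigma^2_f}, \qquad \sigma^2_f = \sigma^2_a + \sigma^2_d + \sigma^2_e$$
where $\sigma^2_a$ is the additive genetic variance, $\sigma^2_d$ is the variance related to the dominance deviation (absent from the additive models), $\sigma^2_e$ is the residual variance, and $\sigma^2_f$ is the total (phenotypic) variance. To verify the influence of including dominance in the model, a relationship was established between the $\sigma^2_a$ of the GBLUP-AD model, which for the discussion of this relationship we will call $\sigma^2_{a(d)}$, and the $\sigma^2_a$ of the GBLUP-A model:
$$RL_{aa} = \frac{\sigma^2_{a(d)}}{\sigma^2_a}$$
Furthermore, to verify the magnitude of the dominance effect as a function of the total variance, a relationship was established between $\sigma^2_d$ and $\sigma^2_f$ of the GBLUP-AD model:
$$d^2 = \frac{\sigma^2_d}{\sigma^2_f}$$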
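The genetic parameters discussed here follow directly from the estimated variance components. The sketch below assumes the usual ratios, with the total variance taken as the sum of the additive, dominance and residual components, and the ratio between the additive variances of the two genomic models as the measure of how much additive variance remains once dominance is fitted. The function name, the dictionary keys and the input values are hypothetical.

```python
def genetic_parameters(s2a_ad, s2d, s2e, s2a_additive_model):
    """Heritabilities and dominance ratios from REML variance components.

    s2a_ad, s2d, s2e: additive, dominance and residual variances from the
    additive-dominance model; s2a_additive_model: additive variance from the
    additive-only model. All definitions below are assumed standard forms.
    """
    s2f = s2a_ad + s2d + s2e                  # total (phenotypic) variance
    return {
        "h2": s2a_ad / s2f,                   # narrow-sense heritability
        "d2": s2d / s2f,                      # share of variance due to dominance
        "H2": (s2a_ad + s2d) / s2f,           # broad-sense heritability
        "RL_aa": s2a_ad / s2a_additive_model, # additive variance retained
    }


# Hypothetical variance components for one trait.
params = genetic_parameters(s2a_ad=2.0, s2d=3.0, s2e=5.0, s2a_additive_model=4.0)
for name, value in params.items():
    print(f"{name} = {value:.2f}")
```

In this toy case the ratio of additive variances is 0.50, meaning that half of the additive variance estimated by the additive-only model is reassigned once dominance is included.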

Statement of research involving plants.
The experimental research and field studies on plants in our studied population comply with pertinent legislation. The plant samples were collected in a commercially managed juçaizeiro plantation in the municipality of Rio Novo do Sul, in the state of Espírito Santo, Brazil, originating from the process of natural recomposition and seed launch in a pasture area. We had the owners' authorization for all data collected and received support from the companies Açai Juçara and Bonaloti for the research development. We are grateful to Pedro and Vicente Bortoloti and their families, the owners of the managed area.
We also confirm that all methods were carried out in accordance with relevant guidelines.

Results
Summary of phenotypic data. For all traits, we observed that the values evaluated in the genotypes tend to be influenced by the year of evaluation (Table 1). We also found that NBP is the trait that varies the most depending on the year. On the other hand, the morphometric traits of fruits and seeds present the most stable phenotypic response among the different harvests evaluated (Table 1).

Estimates of variance components and genetic parameters. Estimates of variance components and genetic parameters for each model are presented in Table 2. Except for the LDS trait, the $\sigma^2_a$ estimates are lower when estimated by the GBLUP-AD model, while the $\sigma^2_e$ estimates indicate small differences between the models. Consequently, the $h^2$ values estimated by the BLUP and GBLUP-A models are generally higher than those estimated by the GBLUP-AD model, which includes both additive and dominance variance effects. The reductions in the $h^2$ estimates ranged from 0 (LDS) to 0.32 (FWB) between the GBLUP-A and GBLUP-AD models and from 0.03 (LDS) to 0.34 (FWB) between the BLUP and GBLUP-AD models (Table 2).
Comparing the $h^2$ estimates between the BLUP and GBLUP-A models, the greatest difference was observed for the RL trait, which was 12.5% smaller when using the additive genomic model (GBLUP-A). When the dominance effect was added to the genomic model (GBLUP-AD), the differences in $h^2$ relative to the BLUP model intensified, with reductions of up to 67.57% (NBP). In general, the group of traits related to fruit production in the field (NBP, FWB, RL, and NRB), except for NRB, showed the greatest changes when considering the dominance effect in the structure of the model.
Overall, $H^2$ ranged from low (RL) through medium (NBP, FWB, AP, PY, LDF, and LDS) to high (NRB, WF, WS, EDF, and EDS). We observed a similarity between the broad-sense heritability ($H^2$) estimates obtained by GBLUP-AD and the narrow-sense heritability ($h^2$) estimates obtained by the BLUP and GBLUP-A methods.
Table 3 presents the relationships due to the dominance effect for all traits studied. Except for LDS, which does not have this effect, all other traits have at least a part of their total variance explained by dominance. In this sense, the $d^2$ values ranged from 0.04 (PY) to 0.27 (FWB) when excluding the LDS trait. The lower values of $RL_{aa}$ and higher values of $d^2$ for NBP and FWB indicate the strong influence of dominance on these traits; the dominance effect surpassed the additive effect, since $RL_{aa}$ had values below 0.50 (Table 3).
Table 4 shows the estimates of PEV, r, LL, AIC, and BIC. Based on these results, the genomic models GBLUP-A and GBLUP-AD provided a better fit and more accurate breeding value predictions than conventional BLUP: the genomic models generally show lower estimates of PEV, similar or higher estimates of theoretical accuracy, and lower values of AIC and BIC.

Adjustment of models and predictive capacity.
The likelihood ratio test (LRT) between genomic models indicated a significant dominance effect for the traits FWB, NBP, WF, and AP, as seen in Table 3 by higher values of $d^2$ (0.27, 0.20, 0.20, and 0.17, respectively) and lower values of $RL_{aa}$ (0.32, 0.30, 0.65, and 0.60, respectively). For the other variables, the GBLUP-AD and GBLUP-A models showed no statistical differences by the LRT. However, it is worth mentioning that r had higher values for the RL trait, and AIC and BIC presented lower values for the model that considers the dominance effect, demonstrating higher performance. For NRB, even with the better fit indicated by AIC and BIC, r and PEV for GBLUP-AD were lower.
The lowest estimates of r were observed for RL: 0.72 (BLUP), 0.72 (GBLUP-A), and 0.74 (GBLUP-AD). The results obtained for the other traits had a high magnitude, ranging from 0.82 (NBP) to 0.99 (WF and WS). For all traits, the predictive abilities ($r_{y,\hat{u}}$) of BLUP were equal to zero. For the genomic methods, $r_{y,\hat{u}}$ (Fig. 2) ranged from 0.18 (NBP) to 0.51 (WS) for GBLUP-A and from 0.23 (NBP) to 0.50 (WS) for GBLUP-AD.
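The likelihood ratio test between the two nested genomic models can be illustrated as follows. The sketch assumes that the statistic is twice the difference in log-likelihood and that it is compared with a chi-square distribution with one degree of freedom (the single extra variance component); the source does not state the reference distribution, so this is an assumption, and the log-likelihood values are hypothetical. Because a variance component is tested on the boundary of its parameter space, some analysts halve this p-value.

```python
import math


def lrt_dominance(ll_a, ll_ad):
    """LRT of the additive-dominance model against the nested additive model.

    ll_a, ll_ad: (residual) log-likelihoods of the two fitted models.
    Returns the LRT statistic and a chi-square p-value with 1 df (assumed).
    """
    stat = max(2.0 * (ll_ad - ll_a), 0.0)        # guard against tiny negatives
    p_value = math.erfc(math.sqrt(stat / 2.0))   # chi-square survival, 1 df
    return stat, p_value


# Hypothetical log-likelihoods for one trait.
stat, p = lrt_dominance(ll_a=-512.40, ll_ad=-509.10)
print(f"LRT = {stat:.2f}, p = {p:.4f}")
```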
Under five-fold cross-validation, only slight differences were observed between the genomic models, the largest being for NBP. The dominance effect was more pronounced for this trait, reflecting the better performance of the GBLUP-AD model (Table 4). This is expressed by $r_{y,\hat{u}}$, which increased from 0.18 to 0.23 (Fig. 2).
Furthermore, except for NRB, the traits related to fruit production in the field (RL, FWB, and NBP) presented lower $r_{y,\hat{u}}$, with estimates below 0.30. The morphometric traits of fruits and seeds (WS, WF, PY, AP, LDF, EDF, LDS, and EDS) had a higher performance for GEBVs, with $r_{y,\hat{u}}$ above 0.35 (Fig. 2).

Discussion
Summary of phenotypic data. Assessing the productive behavior of juçaizeiro (Table 1) is of great importance for growers and stakeholders involved in the crop as it enables an understanding of the productive behavior of this plant and facilitates future estimates of output and profit. It is also worth mentioning that the experimental area evaluated received no cultivation treatments, and therefore, the productive aspects have significant potential for increasing yield by implementing management practices that lead to short-term changes. Added to this, for the crop as a whole, there is a high potential that genetic advances can be obtained in the juçaizeiro breeding programs in the coming years, which will contribute to establishing the species in the fruit sector and maximizing economic gains in fruit production.

Genomic selection and dominance effect in Juçaizeiro breeding.
In the present study, BLUP, GBLUP-A, and GBLUP-AD were evaluated to estimate the variance components, genetic parameters, prediction accuracies, and predictive abilities, with the objective of assessing the efficiency of genomic selection and genetic control of several productive traits of the juçaizeiro. This species is in the early stages of improvement, and our study aimed to provide information on its genetic control of several traits of interest to fruit production.
As this crop is only found in forest fragments and small commercial orchards, few improvement studies have been carried out with it, and these address only fruit, seed, and germination traits [43][44][45][46] . Additionally, those studies incorporate only additive effects into the genetic value prediction models, without knowledge of the kinship between individuals. In contrast, we utilized not only morphometric aspects but also data related to fruit yield, and we incorporated dominance effects into additive models through marker-based knowledge of kinship among individuals. Therefore, this is the first study to include dominance in the genetic control study of juçaizeiro traits.
To better understand the importance of knowledge about the genetic control structure of productive traits in the juçaizeiro, some specific aspects related to species breeding must be presented. The species is currently propagated exclusively by seeds, and breeding programs primarily focus on understanding the additive genetic control for estimating and selecting the best genotypes based solely on these effects 33,45,46 , since the dominance effects will not be transmitted to the selected materials due to the propagation method used.
Although dominance is not considered in seed recommendations for commercial plantations, it plays a crucial role in the species' breeding program. Thus, a deeper understanding of the genetic control structure and the magnitude of these effects on the traits of interest is essential for decision-making by breeders. By quantifying the effects that explain the observed phenotypic variation, breeders can optimize selective gains and use more effective methods. Moreover, assessing genetic control and the magnitude of the effects of genetic dominance is necessary for making informed decisions about genetically complementary breeding. This understanding enables breeders to identify the most effective crosses to achieve desirable traits 47 , as crosses of the best genotypes based solely on additive effects may not provide the best progenies 48,49 . Thus, knowledge of dominance associated with genetic divergence among individuals becomes essential for promoting the capitalization of the heterotic effect through controlled crosses, seeking better results in the population of the second selection cycle.
In addition to being a crucial aspect for developing hybrids, understanding the dominance effect is fundamental to detecting disturbing effects in selecting superior genetic materials 50 . However, in classical breeding programs, estimating these variance components becomes unfeasible because it requires a large number of controlled crosses and large experimental areas. Moreover, crop development time can delay the species improvement program, taking at least 10 to 15 years to estimate genetic effects such as dominance, considering that the tested materials originate from controlled crosses. Thus, genomic techniques have proven effective for the juçaizeiro by providing rapid, in-depth information about the genetic control of its traits and improving estimates of additive genetic effects. In this sense, the genomic methods applied in the present work already surpass the traditional BLUP method, since the latter does not allow estimation of dominance effects due to the lack of pedigree information. Moreover, these more sophisticated genomic techniques offer advantages as they generate increased selective accuracy and allow more extensive studies of the genetic control of traits in a more immediate way, while traditional methods would require the performance of controlled crosses (prone to errors and contamination by pollen) 51 , long periods for plant development, and large experimental areas.
Table 2. Estimates of variance components and genetic parameters estimated by the BLUP, GBLUP-A, and GBLUP-AD models, using information from genotyped trees for the traits number of bunches per plant (NBP); fruit weight per bunch (kg) (FWB); rachis length (cm) (RL); number of rachillas per bunch (NRB); weight of 25 fruits (g) (WF); weight of 25 seeds (g) (WS); amount of pulp in 25 fruits (g) (AP); pulp yield (%) (PY); equatorial diameter of fruits (mm) (EDF); longitudinal diameter of fruits (mm) (LDF); equatorial diameter of seeds (mm) (EDS); longitudinal diameter of seeds (mm) (LDS). $\sigma^2_a$: additive variance; $\sigma^2_d$: dominance variance; $\sigma^2_e$: residual variance; $H^2$: heritability in the broad sense; $h^2$: heritability in the narrow sense.
Table 4. Goodness-of-fit and prediction parameters for genetic values evaluated for the BLUP, GBLUP-A, and GBLUP-AD models using information from genotyped trees for the following traits: number of bunches per plant (NBP); fruit weight per bunch (kg) (FWB); rachis length (cm) (RL); number of rachillas per bunch (NRB); weight of 25 fruits (g) (WF); weight of 25 seeds (g) (WS); amount of pulp in 25 fruits (g) (AP); pulp yield (%) (PY); equatorial and longitudinal diameter of fruits and seeds (mm) (EDF, LDF, EDS, and LDS, respectively).
To compare the methodologies, the AIC and BIC goodness-of-fit measures were used. Through them, we were able to conclude that the inclusion of dominance effects improves model fit for all evaluated traits, as evidenced by the smaller AIC and BIC values (Table 4). Conversely, the BLUP model presented higher AIC and BIC values than the other models for all evaluated traits, indicating the poorest fit (Table 4). However, including the dominance effects may lead to negative responses in predicting some traits, as shown by the increase in PEV and, consequently, a slight reduction in r (Table 4). Including dominance effects in the model brought more noticeable improvement to the traits FWB, NBP, WF, and AP. Accordingly, the $r_{y,\hat{u}}$ values considering dominance were superior to those of the GBLUP-A model, indicating the importance of including this effect in the additive model for estimating individual GEBVs.
As previously mentioned, even the lowest r values observed for the traits generally have a high magnitude, and even though the LRT revealed no differences between the methods for some traits, the use of the dominance effect is justified for understanding the genetic control of the traits.
In the present study, the differences in $h^2$ observed between the BLUP and GBLUP-A models did not have high magnitudes. This same behavior was observed in the morphological traits of pine 52 . However, in contrast to these observations, studies conducted on Eucalyptus found more pronounced changes between the estimates 14,15 . Nevertheless, in the cited studies, the drastic changes that can be achieved using marker-based models compared to pedigree information are evident; this was also observed in the present study when the dominance effect was added to the model, which can correct for overestimations in additive genetic variance, as shown by the reduction in the $h^2$ estimates (Table 2), agreeing as well with studies on maize 53 and eucalyptus 54 .
The similarity observed between the broad-sense heritability estimates ($H^2$) obtained by GBLUP-AD and the narrow-sense heritability ($h^2$) obtained by GBLUP-A may be associated with the parameterization of the marker matrix used to estimate additive effects in the VanRaden (2008) method 18 , which makes the additive effects of markers correspond to the allelic substitution effect. The estimation terms of this effect contain both the additive effect of the homozygote and the dominance effect of the heterozygote, $\alpha = a + d(q - p)$, leading us to believe that the dominance effect is confounded in this estimate, resulting in overestimations by the latter model. Consequently, the $\sigma^2_a$ estimated by GBLUP-A is close to the $\sigma^2_g$ estimates obtained by GBLUP-AD, resulting in similar $h^2$ and $H^2$ estimates between the methods.
In addition, the use of molecular markers shows great relevance for a more in-depth investigation of the genetic control of traits and allows more realistic estimates of gains. The observed broad-sense heritability ($H^2$), which shows how much of the phenotypic variation is explained by the genotype, can be classified as low (RL) (< 0.30), medium (NBP, FWB, AP, PY, and LDS) (0.30-0.60), and high (NRB, WF, WS, EDF, LDF, and EDS) (> 0.60) 55 . The results shown in Table 2 reveal that traits associated with fruit production (NBP, FWB, RL, and NRB) are more influenced by environmental effects than morphometric variables (WF, WS, AP, PY, EDF, LDF, EDS, and LDS). Among production-related traits, NRB stands out from the others in terms of genetic control, with an $H^2$ of 0.68 classified as high.
Adding dominance effects to the predictive models can increase the predictive ability of the genomic model, with the improvement determined by the proportion of the total variation that this effect explains. This behavior can be observed mainly for the traits FWB (Table 3) and NBP (Table 4), which showed the greatest influence of dominance. Improvement in models incorporating dominance for traits with higher $d^2$ can also be observed in simulated data from pine populations 49 . Such improvements are fundamental for the juçaizeiro. Although the species is not propagated clonally, removing the dominance effects from estimates of additive effects can increase selective accuracy and promote greater future gains, highlighting the relevance of genomic techniques in the studied population.
Therefore, the presented results contribute to improving the understanding of the genetic structure that governs part of the phenotypic variation of quantitative traits of productive and industrial interest. Moreover, the importance of including the dominance effect to obtain a more realistic partitioning of genetic variance into its additive and non-additive terms is evident from the reduction of $h^2$ for most traits.
Predictive ability. Predictive ability is a striking measure of comparison among the models tested in this study. Since no pedigree information is available to conventional BLUP, the genetic values predicted by this method for untested materials correspond to the average; consequently, the predictive ability becomes zero. For the models based on genomic information, estimates of $r_{y,\hat{u}}$ differ among the traits and are associated with several factors, such as $h^2$. This is further evidence of the superiority of genomic methods over the traditional method.
In general, two factors affect the predictive ability of marker-based models: heritability, which depends on the genetic control of the trait, and phenotyping quality, which is associated with field evaluation processes 56 . We found estimates of $r_{y,\hat{u}}$ with mean values below 0.55. Since this is an initial study applying genomic techniques in natural populations, we raise hypotheses about techniques that could improve the results observed for the experimental condition tested, although they require complementary studies.
In this sense, it is possible that including covariates associated with plant age, such as height and circumference, and covariates associated with the conditions of the plot where a group of plants is located could improve the performance of predictions. Additionally, the trait-assisted method of genomic prediction 57 , with or without covariates, may be an alternative to maximize predictive abilities. This method not only uses molecular marker information but also benefits from the correlated effects among the traits of individuals whose genetic value is being predicted to improve the predictions of a key trait.
In general, the values of $r_{y,\hat{u}}$ obtained using the GBLUP-A and GBLUP-AD methods were similar, with a greater difference observed for NBP (Table 4). Such similarity in predictive ability between models has been observed in the literature [58][59][60] . In this sense, Azevedo et al. (2015) 60 concluded that other parameters should be considered for model comparison. The authors determined that estimates of heritability, bias, and the relationship between variation due to dominance and additive effects are more relevant factors for such a comparison. Therefore, in the present study, as the GBLUP-AD model presented the best fitting parameters (AIC and BIC) and demonstrated that GBLUP-A may have confounded its additive term with the dominance effect, overestimating $h^2$, we conclude that the model that included the dominance term improved the selection of superior genetic materials, with GBLUP-AD superior to GBLUP-A.

Conclusion
Our findings on model adjustment parameters and predictive abilities show that genomic methods are superior to conventional BLUP for the juçaizeiro. The superiority of genomic prediction is further confirmed by the fact that it allows a more in-depth study of the genetic control of traits, which is not possible by BLUP due to the complete absence of knowledge of kinship between individuals. To generate such information, the conventional method would require great effort and financial resources, in addition to a research period that would easily exceed 15 years.
For all traits but LDS, which presented variance due to dominance equal to zero, the models that considered only the additive effects presented estimates of the variance components and genetic parameters greater than those of the GBLUP-AD model, indicating that including the dominance effect corrects overestimations, returning more realistic estimates. Thus, genetic gains can be estimated with greater accuracy from the results generated by the GBLUP-AD model.
Among the methods and models tested (BLUP, GBLUP-A, and GBLUP-AD), GBLUP-AD had the best results, demonstrating the importance of estimating the dominance effect for productive traits in juçaizeiro.

Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request and with permission of W agency.